Abstract

The theoretical limits of 'lossy' data compression algorithms are considered.
The complexity of an object as seen by a macroscopic observer is the size of
the perceptual code which discards all information that can be lost without
altering the perception of the specified observer. The complexity of this
macroscopically observed state is the simplest description of any microstate
comprising that macrostate. Inference and pattern recognition based on
macrostate rather than microstate complexities will take advantage of the
complexity of the macroscopic observer to ignore irrelevant noise.

The quantification of information



Information theory in its modern form originated from Claude Shannon's [22] 
usage of Gibbs' entropy formula to describe communication channels: 



$$ S = -k \sum_i P_i \log P_i \tag{1} $$

This formula originally applied to an ensemble of microscopic states, and
the analytic form of expected log-probability describes the entropy of systems
and their representations whether classical or quantum in nature. In the context
of quantum mechanics, it becomes the von Neumann entropy of the state density
matrix, $S = -\mathrm{trace}(\rho \log \rho)$. The story goes that it was
actually von Neumann who suggested the term 'entropy' to Shannon for his
information function, for two reasons: 'In the first place your uncertainty
function has been used in statistical mechanics under that name, so it already
has a name. In the second place, and more important, nobody knows what entropy
really is, so in a debate you will always have the advantage.'

Entropy has the units of the logarithm of action. Shannon showed that,
in the absence of Boltzmann's constant, k, entropy quantifies the number of bits
of data needed to identify a sample from some distribution. It quantifies the
amount of choice or uncertainty that must be overcome in order to invoke the
axiom of choice and select a specific element from a set. The Shannon entropy,
H, limits the information capacity of a signal communicated using an alphabet
or codebook with known distribution, P.

$$ H = -\sum_i P_i \log P_i \tag{2} $$

Or, in the case of a continuous pdf, $H = -\int p(x) \log p(x)\,dx$. The
functional form of information entropy has the properties that one expects from
a linear measure of choice [22] and, as such, establishes a theoretical bound
for the average information capacity of a string of symbols sent from an ergodic
source to a receiver. The base of the logarithm is equal to the cardinality of
the symbol set. When the base is two, the units of entropy are bits; in base e
they are 'nats', etc.
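As a minimal numerical sketch of equation (2), the following Python function evaluates the entropy of a discrete distribution in a chosen base. The name `shannon_entropy` and the example distributions are illustrative assumptions and do not come from the original text.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Return H = -sum_i P_i log P_i for a discrete distribution.

    Terms with P_i == 0 are skipped, following the usual
    convention 0 log 0 = 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Illustrative distributions (assumed for this sketch).
fair_coin = shannon_entropy([0.5, 0.5])      # entropy in bits
biased_coin = shannon_entropy([0.9, 0.1])    # less uncertain than a fair coin
uniform_8 = shannon_entropy([1.0 / 8] * 8)   # uniform over eight outcomes
```

For a uniform distribution over n outcomes the value reduces to log n, the logarithm of the set cardinality.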

Shannon entropy represents choice or uncertainty in the space of possible
states; it is synonymous with the development of information theory [3].
However, the definition of entropy due to Boltzmann is still more widely known
for its role in the thermodynamics of physical systems [5] [17] [13] than
its role in the theory of information. The definition of microscopic
entropy (originally referred to by Boltzmann as 'molecular chaos') is

$$ S = k \log n \tag{3} $$

for a set having n discrete elements. This definition may be extended to
continuous spaces by considering n as a volume V of phase space, having some
measure, $\mu$; in such a case the entropy becomes $S = k \log \mu(V)$.

Boltzmann's thermodynamic entropy and Gibbs's statistical mechanical entropy
are closely related. The Gibbs entropy of an ensemble reduces to the
Boltzmann entropy in the case of an ensemble of n equally likely microscopic
states (or a volume n of state space) corresponding to a single maximum-entropy
equilibrium state. Alternately, Gibbs' entropy function may be shown to emerge
from Boltzmann's entropy as the increase in phase space volume corresponds to
the expected value of the log-probability [13].

The Boltzmann entropy of a non-equilibrium system (which could have any
number of partial equilibrium macrostates) is proportional to the sum of the
entropies of each macrostate [13]. The Boltzmann entropy of such a macrostate
is the logarithm of the number of microscopic states consistent with the observed
macroscopic state of the ensemble, or, equivalently, the volume of phase space
occupied by that macroscopic state.

Rather than Boltzmann's entropy function, which includes a constant to
describe the phase spaces of physical systems, we will refer to an abstract
Boltzmann entropy function on a discrete state space, $S^* = \log n$, which is
simply the logarithm of the set cardinality n. If the set is continuous the
form $S = \log \mu(V)$ is implied. A discrete version of this abstract
Boltzmann entropy takes the form of the Hartley information function $\log n$.

Shannon entropy is defined for ergodic sources [22], not particular instances
of data. If a source is not ergodic then the results of classical information
theory may hold only approximately, on average, or asymptotically [15].
Colloquially, ergodicity means that a series is statistically uniform
throughout, which is related to the notion of a stationary stochastic process.
For example, for a memoryless source such as a coin, 01101001111010101101 may
seem like a typical sequence, but 11111111111100000000 seems, offhand, highly
unlikely to have been produced by the same random process. However, these
sequences have the same number of 1s and 0s, so when viewed as an unordered set
(rather than, for instance, a Markov chain) they have identical distributions
with the same Shannon entropy.

Kolmogorov Complexity resolves this difficulty by describing any type of
binary symbolic information, regardless of the source. It is defined as the
minimum amount of information needed to completely reconstruct some object,
represented as a binary string of symbols, X [3] [15].

$$ C_f(X) = \min_{f(p) = X} |p| \tag{4} $$

In the parlance of computer science, f is a computer and p is a program
running on that computer. The Kolmogorov Complexity is the length of the
shortest computer program which terminates with X as output. In the example
of the binary sequences above, the second clearly has a simpler algorithmic
representation, whereas the first is nearly random.
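The contrast between the two coin sequences can be checked numerically. The sketch below is an illustration only: it computes the empirical Shannon entropy of the symbol frequencies and, as a crude and computable stand-in for description length, the number of runs in each string. The run count is an assumption made for this example and is not the Kolmogorov complexity, which is not computable.

```python
import math
from collections import Counter
from itertools import groupby

def empirical_entropy(bits):
    """Shannon entropy (bits per symbol) of the symbol frequencies in `bits`."""
    counts = Counter(bits)
    total = len(bits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def run_count(bits):
    """Number of maximal runs of identical symbols.

    A run-length description needs one entry per run, so fewer runs
    means a shorter description under this toy encoding.
    """
    return sum(1 for _ in groupby(bits))

typical = "01101001111010101101"   # first sequence from the text
ordered = "11111111111100000000"   # second sequence from the text

h_typical = empirical_entropy(typical)
h_ordered = empirical_entropy(ordered)
runs_typical = run_count(typical)
runs_ordered = run_count(ordered)
```

Both strings contain the same symbol counts, so their empirical entropies agree, while the ordered string needs far fewer runs to describe.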

The Turing equivalence of different computers [24] relates Kolmogorov
complexities by an equivalence constant [15] [3]:

$$ C_f(X) \le C_g(X) + C \tag{5} $$

Since optimal specification does not depend on the particular computer used,
up to this additive constant, we will assume a standard computer f unless
otherwise specified.

The properties of C(X) are sometimes more natural when the set of possible
programs is constrained to be prefix-free, that is, no program is a prefix of
another, so programs are self-delimiting rather than being demarcated by stop
symbols. In this case, we refer to Chaitin's algorithmic prefix complexity
K(X). We won't delve into the details of K(X), but we note that a program can
be made self-delimiting by recursively prefixing the value of its length, and
the length of this prefix, and so forth, so
$K(X) = C(X) + C(C(X)) + O(C(C(C(X))))$. K(X) gains some important attributes
[15] [2] that C(X) lacks. One is convergence of the universal probability:

$$ U(x) = \sum_{f(p) = x} 2^{-|p|} \tag{6} $$

This probability measure may be interpreted as the probability that a
randomly selected prefix-free program terminates with x as its output.
Convergence is assured by the use of a self-delimiting prefix code, as the
Kraft inequality [3] states that the lengths of codewords x in a prefix code
satisfy $\sum_x 2^{-|x|} \le 1$. Though convergence to the limit is in general
very slow [15], this series is often dominated by the shortest program, and
$U(x) \approx 2^{-K(x)}$ constitutes a reasonable first-order approximation
[3] [15] to the universal probability for these typical objects, which are said
to be 'shallow', whereas certain special 'deep' objects converge more slowly.
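The Kraft inequality is easy to verify on small examples. The following sketch, with hypothetical helper names `is_prefix_free` and `kraft_sum` and example codes chosen for illustration, checks the prefix-free property and evaluates the sum exactly with rational arithmetic.

```python
from fractions import Fraction

def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (duplicates count as a violation)."""
    words = sorted(codewords)
    # After lexicographic sorting, any prefix relation appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def kraft_sum(codewords):
    """Exact value of the sum of 2^(-|x|) over all codewords x."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)

prefix_code = ["0", "10", "110", "111"]      # a complete prefix code
not_prefix_code = ["0", "01", "10", "11"]    # "0" is a prefix of "01"

kraft_prefix = kraft_sum(prefix_code)
kraft_other = kraft_sum(not_prefix_code)
```

The first code is prefix-free and its Kraft sum does not exceed one; the second violates the prefix condition and its sum exceeds one, so no prefix code with those codeword lengths exists.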

Performing complexity-based inference requires that we have a notion of
the information that one object contains about another, or vice versa. In the
theory of Kolmogorov complexity [15], this is achieved by the conditional prefix
complexity:

$$ K(A|B) = K(AB) - K(B) \tag{7} $$

where the combination AB is the concatenation of strings A and B. Algorithmic
prefix complexity characterizes measurements at finite precision and
may also be defined via algorithmic entropy, which we will consider shortly.

The notion of splitting complexity into a 'regular' (low prior probability)
part and a 'random' (high prior probability) part is also originally due to
Kolmogorov. The Kolmogorov Minimal Sufficient Statistic identifies the smallest
superset of x which may be described with fewer than k bits. This is closely
related to the notion of stochastic processes used earlier by Langevin to
separate dynamical systems into deterministic and random components.

The Kolmogorov complexity may be used to define stochastic sequences in
general and, as such, is fundamental to the notion of a statistical
probability [15]. For natural numbers k and $\delta$, we say that a string x is
$(k, \delta)$-stochastic if and only if there exists a finite set A such that:

$$ x \in A, \quad C(A) < k, \quad C(x|A) > \log|A| - \delta \tag{8} $$

The deviation from randomness, $\delta$, indicates whether x is a typical or
atypical member of A. The Kolmogorov Minimal Sufficient Statistic for x, given
n = |x|, is the set of minimum cardinality subject to the first two constraints
of stochasticity. This is defined through the Kolmogorov Structure Function,
$C_k(x|n)$:

$$ C_k(x|n) = \min\{\log|A| : x \in A,\ C(A|n) < k\} \tag{9} $$

The minimal set $A_0$ minimizes the randomness deficiency, $\delta$, and is
referred to as the Kolmogorov Minimal Sufficient Statistic for x given n. This
generalizes the notion of fitting a distribution to x. The Kolmogorov Structure
Function $C_k(x|n)$ measures the amount of randomness in the string x. For n
coin tosses, it is nearly n; for a number such as $\pi$, it is O(1).

Another fundamental partitioning of random and nonrandom data is provided
by the Algorithmic Entropy function [26], introduced by Zurek as physical
entropy as it generalizes classical thermodynamics to a physical theory of
information. Algorithmic Entropy [26] [15] combines Kolmogorov complexity and
Boltzmann entropy to measure the macroscopic complexity of certain types of
measurements. It relates computation and the informatic content of real-valued
measurements to statistical mechanics. The algorithmic entropy of a string,
H(Z) (not to be confused with the Shannon Information, H, of a source) is
defined in its most basic form as:

$$ H(Z) = K(Z) + S \tag{10} $$

In this context, $Z = X_{1:n}$ is a description of a macroscopic observation
constructed by truncating a microscopic state X to a bit string of length n.
K(Z) is the algorithmic prefix complexity [3] [15] of this representation of the
macrostate. In the case of algorithmic entropy, the Boltzmann entropy S is seen
to be the additional complexity needed to specify a microstate given knowledge
of its macrostate.

Since all the microstates comprising a partition of macrostate share a common
prefix in their string representation, K(Z) is constructed as the prefix
complexity of these microstates. The microstates are contained in a volume of
state space sharing a common prefix. Relaxing this constraint leads to a more
general functional, the effective complexity.

Gell-Mann and Lloyd [6] describe a procedure for determining 'Effective
Complexity' which extends the principles of maximum entropy [11] to complexity
theory. The total information functional $\Sigma$ is defined as the sum of the
Shannon information of an ensemble Z, of which the string X is a member, and
another argument, the effective complexity, Y, the K-complexity of this
ensemble. By minimizing total information subject to arbitrary constraints
f(X) = c, which incorporate any prior information known about the system, the
most likely configuration of the ensemble may be determined.

$$ \Sigma = Y + H(Z) = K(Z) + H(Z) \tag{11} $$

This expression minimizes complexity and maximizes uncertainty. Typically,
the total information is minimized by the Kolmogorov complexity [6] and these
quantities are within a few bits of K = Y + H. The relationship between
K, Y, and H may be characterized in terms of input to a computer program.
The effective complexity Y represents a fixed deterministic algorithm, and the
entropy H represents the information content of an arbitrary initial condition
chosen as input for that algorithm [7]. Together, these represent the minimal
total information content needed for the output of the program. In the absence
of any additional constraints, this is typically the Kolmogorov complexity K.

When macrostates are cylinder sets, which are coarse-grained partitions of
phase space, the effective complexity becomes identical to the algorithmic
entropy. In contrast to algorithmic entropy, effective complexity applies
generally to any macrostates, which need not be compact volumes of state space
and whose string representations don't generally share a common prefix. Such
macrostates may represent any set of objects equivalent under an arbitrary
relation; however, coarse-grained macrostates are a very special case which
leads to algorithmic entropy and an alternative definition of the prefix
complexity. Note that when algorithmic entropy is generalized to ensembles of
arbitrary measure, it becomes equivalent to the effective complexity.

Finally, we note that the Kolmogorov complexity is not generally calculable
due to non-halting programs, and, moreover, a binary computation system
is only optimal for representing powers of two. Though non-constructive,
Kolmogorov complexity is a useful conceptual device which simplifies the
reasoning of many proofs, e.g. demonstrating the incompleteness of axiomatic
systems or the limits of inductive reasoning [15].

1 Macroscopic Equivalence of Microstates

Macrostates may be described using the simplest representation of an equivalent 
microstate. This measure has an important application to 'lossy' data compres- 
sion, as its objective is to find the simplest representation which is equivalent to 
a more complex datum. This is a function of the particular observer or classifier 
involved, which we may characterize by an equivalence relation, P. 

The simplest string capable of reconstructing an object equivalent to X is
the simplest definition of an object belonging to the equivalence class X/P;
since no shorter string appears equivalent to the observer, this code is optimal.
This establishes a formal theoretical limit for the performance of the so-called
lossy data compression algorithms prevalent in digital media. Beyond this level,
information about microscopic state constitutes irrelevant noise. Discarding
this irrelevant microscopic data boosts the signal-to-noise ratio perceived by
the macroscopic observer, which facilitates the inference and machine learning
of macroscopic signals. Macroscopic equivalence relations arise naturally in the
lossy compression of perceptual data (images, audio, and video), as the objective
of such algorithms may be phrased as a search for shorter representations which
are indistinguishable to an observer represented by the relation P.

The observer or classifier P groups indistinguishable objects into equivalence
classes, with a finite but large number of objects falling into each equivalence
class. Consider the equivalence relation P on the set of strings as a function
which maps string representations of microstates to observable macrostates.
X is indistinguishable from Y if and only if microscopic states X and Y are
congruent modulo the equivalence relation P.

A string X represents the microstate of an object or ensemble, and its
equivalence class X/P is the macrostate of the object/ensemble as observed under
P. For the purposes of this paper, set membership in class X/P is formally
presumed to be determinable by an Oracle for P: a Turing machine may ask
the Oracle a true/false question to determine set membership in X/P in a single
operation. In general, P may take any form. The equivalence relation may
be endowed with arbitrary criteria so long as these criteria provide consistent
classification. The canonical example of classical thermodynamics involves
measurement at a particular scale, resulting in P which partitions the phase
space of the system at a characteristic length scale. P may specify, for
example, a neural net or other classifier, time scales, statistics from human
observations, or other factors.

2 The Complexity of an Equivalence Class 

We introduce a new complexity metric for an object X, the Kolmogorov
complexity of the simplest object equivalent to X under the relation P.
$S_f(X/P)$ is a measure of the descriptive complexity of an equivalence class
of macroscopic objects.

$$ S_f(X/P) = \min_{Y \in P(X)} K_f(Y) \tag{12} $$



We refer to $S_f(X/P)$ as the complexity of a macroscopic state P, the
macrostate complexity, or simply the macrocomplexity. These are macrostates in
the sense of classical thermodynamics; as such, the logarithm of their
cardinality is the Boltzmann entropy S. $S_f(X/P)$ is the minimum Kolmogorov
complexity of any string equivalent to X, the length of the shortest computer
program which terminates with output in the class P(X). K(Y), then, could also
be used as a minimal description of the macroscopic equivalence class P(X).
K(Y) represents the shortest description macroscopically equivalent to X, which
is the optimal information-losing ('lossy') data compression of string X (on
computer f).

In contrast to the Kolmogorov Structure Function, which produces the minimal
Sufficient Statistic as a superset of x given the desired complexity of this
set, the macrocomplexity is a function of x and its superset P(X).

To simplify the expression, we substitute the definition of $K_f(X)$ into the
definition of S(X/P), which reduces to:

$$ S_f(X/P) = \min_{f(p) \in P(X)} |p| \tag{13} $$

This looks similar to the definition of the Kolmogorov complexity, but the
equality in the argument has been replaced by an equivalence. The
macrocomplexity $S_f(X/P)$ is a function of the microstate X and equivalence
relation P, in contrast to Kolmogorov's C-complexity, which depends only on X.
Clearly, $S \le C$, since X itself belongs to P(X). In fact, the difference
between the macrocomplexity $S_f(X/P)$ and the
C-Complexity of a typical state is close to Boltzmann's entropy function.
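The definition can be illustrated with a toy model in which every quantity is computable. In the sketch below, the number of runs in a bit string stands in for the complexity, and the hypothetical observer P perceives only the number of 1s. Both choices are assumptions made for illustration; the true macrocomplexity uses Kolmogorov complexity and cannot be computed by exhaustive search in general.

```python
from itertools import groupby, product

def toy_complexity(s):
    """Computable stand-in for complexity: the number of runs in the bit string."""
    return sum(1 for _ in groupby(s))

def macrostate(s):
    """Hypothetical observer P: only the number of 1s is perceptible."""
    return s.count("1")

def macrocomplexity(x):
    """Minimum toy complexity over all strings Y equivalent to X under P.

    Exhaustive search over all strings of the same length, so this is
    only practical for short strings.
    """
    target = macrostate(x)
    candidates = ("".join(bits) for bits in product("01", repeat=len(x)))
    return min(toy_complexity(y) for y in candidates if macrostate(y) == target)

x = "0110100111"
c_x = toy_complexity(x)     # complexity of the microstate itself
s_x = macrocomplexity(x)    # complexity of the simplest equivalent microstate
```

Because X is always a member of its own class, the minimum can never exceed the complexity of X, which mirrors the inequality stated above.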

3 Boltzmann Entropy and Optimal Information-Losing Codes

The entropy of multimedia data is typically high: its string representations
are irregular and nearly incompressible by universal (lossless) data compression
algorithms [13] [15], so the output of such algorithms is not significantly
shorter than the original data. An effective lossy compression algorithm, on
the other hand, minimizes description length within an equivalence class whose
elements are indistinguishable to a macroscopic observer or other equivalence
relation P, which may allow significant savings.

As a concrete example, consider lossy MPEG-1 Audio Layer III (MP3) audio
compression, which typically provides higher levels of compression of music than
the universal Lempel-Ziv [14] [25] compression algorithm. MP3 frequently
compresses music recordings by 90%, whereas the size reduction Lempel-Ziv
achieves on raw music data is often close to zero. The reason such an
improvement is possible, given entropic coding limits, is that the human nervous
system discards large amounts of irrelevant perceptual data. As a result, the
classes of objects which are indistinguishable to humans often have many
members, which, in turn, leads to the existence of shorter descriptions. By
refining this notion, we will elucidate the role of Boltzmann entropy functions
in macroscopic observation and lossy data compression.

For media such as audio or video which mimic the sensory channels of a
macroscopic human observer, the amount of regularity or redundancy is often
low in comparison to the length of the string. Not much compression is possible,
so $C_f(X) \approx |X|$. In this case, X is regarded as mostly random or
chaotic [3] [15]. Regardless of the string X, the size of the class X/P
naturally affects the existence of simpler equivalent descriptions. If the
criteria for P are very restrictive, then it may be that

$$ S_f(X/P) \approx C(X) \tag{14} $$

That is, any description of a macrostate requires specification of nearly the
entire microstate. We will focus our attention on the more interesting case,
when X/P contains simpler microstates equivalent to a typical element X:



$$ S_f(X/P) < C(X) \tag{15} $$

This is the case when lossy compression is practical, for example, with many
digital audio and video recordings. In such cases, the Boltzmann entropy of X/P
is comparable to the difference between the macrocomplexity and K-complexity.
This may be demonstrated via the universal probability measure. Let us define
the universal probability of an equivalence class:

$$ U(X, P) = \sum_{f(p) \in X/P} 2^{-|p|} \tag{16} $$

Here the programs p are implied to be self-delimiting prefix codes, and hence
the relation involves K-complexity rather than C-complexity. This may be
rewritten as the sum of the individual universal probabilities $U(X_i)$ for each
string $X_i$ belonging to the class X/P:

$$ U(X, P) = \sum_{i=1}^{|X/P|} U(X_i) \tag{17} $$

To first order, the universal probability of programs having X as output is
dominated by the shortest program and may be approximated by

$$ U(X) \approx 2^{-K(X)} \tag{18} $$

The universal probability of programs congruent to X/P may be expressed
as

$$ U(X, P) \approx 2^{-S(X/P)} \tag{19} $$

The relative frequency of programs whose output is X among programs with
output in the class X/P, then, is the ratio of these two measures. The universal
probability of microstate X given that X is in X/P becomes

$$ U(X \mid X/P) = \frac{U(X)}{U(X, P)} \tag{20} $$

or, taking the leading terms in each series, we have, to first order,

$$ U(X \mid X/P) \approx 2^{S(X/P) - K(X)} \tag{21} $$

In a classical statistical ensemble, each of the |X/P| microstates of the
system is equally likely, with probability $1/|X/P|$. These probabilities do
not directly correspond to the universal probabilities. The latter are the
probabilities of obtaining a string as the output of a random program on a
certain Turing machine, and the former are simply the probabilities directly
implied by the length of the string. Directly equating the universal
probability and the likelihood is not appropriate.

However, we may characterize a typical element X whose universal probability
is close to its mean value of $1/|X/P|$. For such a typical element:

$$ U(X \mid X/P) = \frac{U(X)}{U(X, P)} \approx \frac{1}{|X/P|} \tag{22} $$

Substituting, we see that the cardinality of the macrostate obeys, to first
order,

$$ U(X \mid X/P) = \frac{1}{|X/P|} \approx 2^{S(X/P) - K(X)} \tag{23} $$

After taking logarithms and inverting the sign, we obtain a relation involving
the Boltzmann entropy of the macrostate, which is the logarithm of the set
cardinality:

$$ S = \log |X/P| \approx K(X) - S(X/P) \tag{24} $$

The Boltzmann entropy of the macrostate is seen to be the difference between
the prefix complexity K and the macrocomplexity S(X/P) for a typical
element of X/P which occurs with approximately average probability. Entropy
represents the additional information needed to specify a typical microstate, of
complexity K(X), provided the description of its macrostate having complexity
S(X/P). This holds for 'shallow' objects, where the universal probability is
dominated by the shortest program, but need not be the case for 'deep' objects,
which reveal their structure in a slow convergence to the universal probability.

If the system has multiple macrostates, rather than a single equilibrium
macrostate, then a different probability measure may apply. The uniform
recursive probability measure for strings of length |X|, $\mu = 2^{-|X|}$,
implies:

$$ |X/P| \approx 2^{K(X) - S(X/P) - |X|} \tag{25} $$

This measure effectively shifts the discrete Boltzmann entropy by a
normalization constant |X|:

$$ S = \log |X/P| \approx K(X) - S(X/P) - |X| \tag{26} $$

This form is related to universal randomness tests [5] [15]. To first order, the
sum of the macrocomplexity and Boltzmann entropy (which is also the total
information $\Sigma$) may be expressed as the Martin-Löf universal randomness
test $K(X) - |X|$:

$$ \Sigma = S^* + S(X/P) \approx K(X) - |X| \tag{27} $$

Such relationships are known to relate algorithmic entropy [15] and prefix
complexity, which may be expressed as a special case of macrocomplexity. The
macrocomplexity provides statistical and thermodynamic bounds for the optimal
performance of lossy data compression, just as Shannon information limits
exact universal compression. Furthermore, since the macrocomplexity is the
Kolmogorov complexity of an equivalent element, it may be used (or
approximated) for minimum description length inference in problems of pattern
recognition and artificial intelligence.



4 Effective Complexity and Algorithmic Entropy of Macrostates

Like many complexity measures [7], macrocomplexity is closely related to
effective complexity and algorithmic entropy [15]. These measures agree, under
appropriate conditions, with macrocomplexity as the best information-losing
data compression of a string X as judged by a classifier P.

The entropy of an unconstrained, discrete set is the logarithm or Boltzmann
entropy function $S = \log |X/P|$ of the macrostate, so the total information
becomes:

$$ \Sigma = Y + H(X/P) = Y + \log |X/P| \tag{28} $$

where Y is the K-complexity of the macrostate X/P. For the case of a
typical element X, the total information $\Sigma$ is close (within a few bits
[7]) to the K-complexity of X.

$$ K(X) \approx Y + \log |X/P| \tag{29} $$

This level of effective complexity is typical of the equivalence class X/P.
Hence, for typical elements, the effective complexity is:

$$ Y \approx K(X) - \log |X/P| \tag{30} $$

This is approximately equal to the first-order approximation to the macrostate
complexity obtained in the previous section:

$$ S(X/P) \approx K(X) - \log |X/P| \tag{31} $$

So, $Y \approx S(X/P)$ is the complexity typically perceived by an observer or
some other classifier described by the relation P. The correspondence may
be seen to hold more generally by considering macroscopic equivalence in terms
of Turing equivalence. The S-complexity may be alternately defined using the
complexity of a computer-observer system. In this context, the entropy plays
the role of a constant which relates the complexity of programs on computer
f to those of a Turing-equivalent computer-observer system, g. Specifically, g
applies to its input program the instructions of computer f followed by the
mapping to the equivalence class P(X), that is, $g(\cdot) = P(f(\cdot))$. This
mapping loses information, so the complexities obey:

$$ K_f(X) \le K_g(P(X)) + C \tag{32} $$

As we have seen, for typical elements, the additive Turing equivalence
constant, C, is approximately the Boltzmann entropy S. It represents the amount
of Kolmogorov complexity lost by the computer-observer system, g = P(f), as
compared to the standard computer, f. This allows an alternative definition of
macrocomplexity: the K-complexity of a macroscopic equivalence class P(X)
on the computer-observer system g:

$$ S_f(X/P) = K_g(P(X)) \tag{33} $$



The macrostate complexity, originally defined in terms of a standard computer
and an observer, is now the complexity of the macrostate on a computer
system which incorporates the observer. This is an effective complexity, the
K-complexity of P(X). In this way, we split the total information $K_f(X)$ into
an effective complexity $K_g(P(X))$ and a Turing equivalence constant C, a
function of P which plays the role of the entropy.

Equivalently, macrocomplexity may be regarded as a generalization of
algorithmic entropy. Endowing the algorithmic entropy with an arbitrary metric
generalizes the prefix complexity [15] to macrostates which do not necessarily
share common prefixes; in this case, macrocomplexity arises from the
algorithmic entropy of arbitrary sets having uniform metrics.



5 Calculating the Complexity of a Macrostate 



The first step in a consideration of macrocomplexity is to specify the observer
or classifier P. Once this is done, a calculation or estimate of complexity may
be desired. Kolmogorov complexities are not generally calculable [5], so unless
the class X/P contains objects with short string representations, exact
calculation of $S_f(X/P)$ could be impossible. Even if one discounts the halting
problem and uses the finitely calculable resource-limited complexities [15], the
number of enumerable strings to consider is potentially daunting.

Practical approximation of $S_f(X/P)$, however, may be fairly simple, given
one or more lossy compression algorithms offering good performance as judged
by the classifier P. Estimation of $S_f(X/P)$ in this case amounts to tuning the
lossy algorithms to minimize length without perceptible loss, as determined by
the relation P. Just as universal data compression algorithms such as
Lempel-Ziv [14] may be used to estimate the algorithmic prefix complexity, K,
existing lossy data compression algorithms allow a quick estimate of certain
macrostate complexities. These macrostate complexities may then be used to
construct a mutual information function, universal probabilities, or other
statistics.
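The tuning procedure can be sketched in a few lines. The example below is a hedged illustration under stated assumptions: the observer P is modelled by a hypothetical tolerance on per-sample error, the lossy algorithm is plain uniform quantization followed by zlib (a Lempel-Ziv variant), and the test signal is a synthetic noisy sine wave. None of these choices come from the original text, and zlib lengths are only upper-bound proxies for the complexities involved.

```python
import math
import random
import zlib

def quantize(samples, step):
    """Round each 8-bit sample to the nearest multiple of `step`, capped at 255."""
    return bytes(min(255, step * round(v / step)) for v in samples)

def equivalent(original, candidate, tolerance):
    """Hypothetical observer P: signals are indistinguishable when no
    sample differs by more than `tolerance`."""
    return all(abs(a - b) <= tolerance for a, b in zip(original, candidate))

def estimate_macrocomplexity(samples, tolerance, steps=(1, 2, 4, 8, 16, 32)):
    """Shortest zlib-compressed length (in bits) among quantized versions
    of `samples` that remain equivalent to the original under P."""
    best = None
    for step in steps:
        candidate = quantize(samples, step)
        if not equivalent(samples, candidate, tolerance):
            continue  # perceptible loss: outside the class X/P
        bits = 8 * len(zlib.compress(candidate, 9))
        if best is None or bits < best:
            best = bits
    return best

# Synthetic test signal (assumed): a sine wave plus small uniform noise.
rng = random.Random(0)
signal = bytes(
    int(128 + 100 * math.sin(2 * math.pi * k / 64)) + rng.randint(-3, 3)
    for k in range(4096)
)

lossless_bits = 8 * len(zlib.compress(signal, 9))           # proxy for K(X)
lossy_bits = estimate_macrocomplexity(signal, tolerance=8)  # proxy for S(X/P)
entropy_estimate = lossless_bits - lossy_bits               # rough log |X/P|
```

Since a step of 1 leaves the signal unchanged, the lossy estimate can never exceed the lossless one; the gap between the two serves as a rough estimate of the Boltzmann entropy of the macrostate.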

Of course, if the cardinality of the macrostate, |X/P|, is known or estimable,
then $S_f(X/P)$ may be approximated using results obtained relating the
macrocomplexity to the Boltzmann entropy. Conversely, knowledge of $S_f(X/P)$
may be used to estimate the cardinality or Boltzmann entropy of an unknown
macrostate (with a known equivalence relation) whose cardinality might
otherwise be difficult or impossible to count.

6 Classification of Macrostates 

Given the ability to calculate or approximate Sf{X/P), the minimal descriptive 
complexity of an element of the equivalence class X/P, we may use it for the 
purpose of classification. We may define a conditional complexity by directly 
substituting macrocomplexity in place of K-complexity: 

$$ S((A|B)/P) = S(AB/P) - S(B/P) \tag{34} $$

Effectively, this is K(A|B) modulo P, and it represents the complexity of
differences between A and B which persist even after passing through the
classifier P. The combination AB is typically chosen in a way that preserves
locality under P. For practical resource-limited estimation, AB should be
constructed such that corresponding structural elements of the objects A and B
are 'close' in some sense of the resulting representation.
Ideally, a similarity measure D(x, y) would have the properties of a distance
metric: $D(x,y) \ge 0$, $D(x,y) = D(y,x)$, and $D(x,y) + D(y,z) \ge D(x,z)$. One
way to accomplish this would be to symmetrize by addition [15] to
$S((A|B)/P) + S((B|A)/P)$, which results in the macroscopic complexity's
equivalent of a mutual information function [3] [15] modulo the relation P.
However, this combination is not necessarily the desired minimal distance
function, as there is generally some redundancy between these two quantities.
On the other hand, the 'max-distance' $E_1 = \max\{C(A|B), C(B|A)\}$ is minimal
among all such distances [15], up to an additive constant. In terms of
macrocomplexity, this is:

$$ D(A, B) = \max\{S((A|B)/P),\ S((B|A)/P)\} $$

This is the minimum amount of additional data that must be specified to
transform A into an element of P(B) or B into an element of P(A), with the
optimal transformation being in one of these two directions. This quantifies
the similarity of any two macroscopic objects and provides a natural framework
for the classification of macrostates. In this framework, classification
problems reduce to minimizing a sort of universal invariant distance from X to
the class:

$$D_{P_i}(X) = \min_{Y \in P_i} D(X, Y) \tag{35}$$

This is evaluated against all macrostates in P to identify the closest
macrostate, $P_i$, to a string whose equivalence class X/P is undefined or
unknown:

$$\mathrm{Class}(X) = \arg\min_{P_i} D_{P_i}(X) \tag{36}$$

For example, the recognition of recorded speech as particular words might
evaluate an unknown audio sample against recordings indexed in a dictionary.
Such patterns may be specified, as would typically be the case with a
dictionary, or they may be defined based on proximity in equivalence distance.
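
The max-distance and the nearest-class rule of equations (35) and (36) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: zlib stands in for the ideal coder, the observer is a hypothetical one that ignores letter case, short text strings stand in for recordings, and the dictionary of labelled exemplars is invented for the example.

```python
import zlib
from typing import Callable, Dict, List

Observer = Callable[[bytes], bytes]


def bits(x: bytes) -> int:
    """Compressed size in bits: a computable upper bound on complexity."""
    return 8 * len(zlib.compress(x, 9))


def distance(a: bytes, b: bytes, p: Observer) -> int:
    """Estimate D(A,B) = max{S((A|B)/P), S((B|A)/P)} by compression."""
    pa, pb = p(a), p(b)
    a_given_b = bits(pb + pa) - bits(pb)
    b_given_a = bits(pa + pb) - bits(pa)
    return max(a_given_b, b_given_a)


def classify(x: bytes, classes: Dict[str, List[bytes]], p: Observer) -> str:
    """Return the label whose closest exemplar minimizes the distance to x."""
    def class_distance(label: str) -> int:
        return min(distance(x, y, p) for y in classes[label])
    return min(classes, key=class_distance)


def observer(x: bytes) -> bytes:
    """Assumed observer P: cannot perceive letter case."""
    return x.lower()


# Hypothetical dictionary of labelled exemplars.
classes = {
    "greeting": [b"hello there, how are you doing today my old friend",
                 b"good morning to you, it is lovely to see you again"],
    "weather": [b"heavy rain is expected tonight with strong northerly winds",
                b"sunny skies and mild temperatures will continue all week"],
}
sample = b"HELLO THERE, HOW ARE YOU DOING TODAY MY OLD FRIEND"
print(classify(sample, classes, observer))
```

Here each class is specified by a list of exemplars, as with a dictionary. Defining classes by proximity instead would amount to clustering on the same `distance` function.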

7 Comments and Discussion 

The extraction of meaningful information has always been a problem in the ma- 
chine recognition of human sensory input. Prior to filtering by neural perceptual 
classifiers, such input is mostly random, incompressible, noisy and chaotic. The 
perception of useful information in such a signal involves removal of irrelevant 
noise in order to recognize compressible and learnable patterns. Lossy data 
compression schemes, by their nature, do this, and, coupled with a macro- 
scopic equivalence relation, allow the practical estimation of macrocomplexity 
in some cases. As lossy compression algorithms improve, so will approximations 
of macrocomplexity, which will improve the quality of pattern recognition. 
As evidenced by the success of lossy perceptual audio encoding,
psychoacoustics has become a relatively mature science. When perceptual
equivalence under P amounts to indistinguishability to a typical human ear,
these psychoacoustic models provide effective lossy data compression. The
resulting macrocomplexity may be used to perform auditory inference by
proximity in equivalence distance.

Spoken language is richer in information than text, but the difficulty of
extracting this information has historically limited its utility in analysis.
For example, in the case of written human language, the transcription of audio
data to symbolic data obviously loses large amounts of information about cues
such as inflection, tone, and timing. One could define macroscopic perceptual
classes based on some semantic equivalence, e.g. X/P could represent
recordings of a particular word. Psychoacoustic models, however, properly
describe indistinguishability of sounds to the ear of a hypothetical listener
rather than this sort of higher-order linguistic processing.

As a trivial example of how macrocomplexity is relevant to inference, consider
the filtering of human speech recorded in a noisy environment. If the original
recording is, for example, a 48kHz channel recorded on an idealized microphone,
then most of its data points are irrelevant to the capture of the human voice,
whose frequency response does not exceed some maximum frequency threshold,
typically around 3kHz. The speech frequency band constitutes a psychoacoustic
model for an observer, P, ignorant of the higher frequencies. As such, a crude
perceptual coding might simply perform a Fourier transform and discard all
frequencies in the spectrum above (or below) the audible threshold. Entropic
compression may then be used to estimate complexities or information distances
using either the filtered spectrum or its inverse Fourier transform, the
filtered signal. In addition to reconstructing the signal using less
information, this improves inference regarding speech, as higher frequency
components are not germane to speech analysis. The result is an estimate of
macrocomplexity for the specified P.
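
The crude perceptual coding described above can be sketched with the Python standard library alone. Everything specific here is an assumption chosen for illustration: a synthetic 750 Hz tone stands in for speech, a 12 kHz tone plus white noise stands in for the irrelevant content, the discrete Fourier transform is a naive quadratic-time implementation, the spectrum is quantized to 8 bits per component, and zlib plays the role of the entropic coder.

```python
import cmath
import math
import random
import zlib


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse transform, returning the real part of the signal."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def low_pass_spectrum(spec, sample_rate, cutoff_hz):
    """Assumed observer P: zero every bin whose frequency exceeds the cutoff."""
    n = len(spec)
    return [c if min(k, n - k) * sample_rate / n <= cutoff_hz else 0j
            for k, c in enumerate(spec)]


def spectrum_bytes(spec):
    """Quantize real and imaginary parts to 8 bits (unit-amplitude tone = 127)."""
    half = len(spec) / 2
    out = bytearray()
    for c in spec:
        for part in (c.real, c.imag):
            out.append(128 + max(-127, min(127, round(127 * part / half))))
    return bytes(out)


def bits(data: bytes) -> int:
    return 8 * len(zlib.compress(data, 9))


def rms(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


n, rate = 256, 48000.0
rng = random.Random(1)
clean = [math.sin(2 * math.pi * 750 * t / rate) for t in range(n)]
noisy = [clean[t] + 0.5 * math.sin(2 * math.pi * 12000 * t / rate)
         + rng.uniform(-0.5, 0.5) for t in range(n)]

spec = dft(noisy)
kept = low_pass_spectrum(spec, rate, 3000.0)
filtered = idft(kept)

micro_bits = bits(spectrum_bytes(spec))   # estimate for the full recording
macro_bits = bits(spectrum_bytes(kept))   # estimate modulo the observer P
print(micro_bits, macro_bits, rms(noisy, clean), rms(filtered, clean))
```

The filtered spectrum compresses to far fewer bits than the full spectrum, and the filtered signal lies closer to the in-band tone than the raw recording does, which is the sense in which discarding imperceptible content aids inference.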

The lossy compression of images and video is a more difficult problem than
audio, as visual processing by humans is not so well understood as
psychoacoustics, and because images often represent more information than
audio recordings. Modern image compression is largely based on wavelets through
the JPEG2000 standard [1], which remains the dominant medium for transmitted
image data. Recent standards such as H.264 video and MPEG-7 audio have been
developed using more realistic psychological models [8] [30]. These algorithms
offer more effective perceptual coding, greater compression, and hence more
accurate estimates of macrocomplexity as compared to their predecessors. As one
might expect based on the discussion here, such algorithms have improved
utility in indexing and retrieval applications [2].

Due to the breadth of other definitions of equivalence classes, the notion of 
splitting microscopic complexity into a macroscopic observer and a macroscopic 
observation has many possible implications. Macrostate complexity may only 
be rigorously defined insofar as the macroscopic equivalence relation P may 
be well-posed. A rigorous and exact calculation of macrocomplexity, like the 
Kolmogorov complexity of its microstate, is not possible beyond trivial cases. 
However, for observers similar to those modeled by existing lossy data compres- 
sion algorithms, the use of such a measure enables pattern recognition which 
can exceed the limits posed by classical information theory. 